AITopics | biomedical dataset

A Framework for Data-Centric Biomedical Natural Language Processing

Fries, Jason Alan, Weber, Leon

Neural Information Processing Systems | Aug-17-2025, 09:35:14 GMT

Prompting offers new opportunities for constructing meta-datasets that capture desirable language reasoning skills. In the general NLP domain, data-centric methods have benefited from community efforts such as Hugging Face's datasets hub [

large language model, machine learning, natural language, (19 more...)

Country:

North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Genre: Research Report (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)

Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models

Ge, Yao, Das, Sudeshna, Guo, Yuting, Sarker, Abeed

arXiv.org Artificial Intelligence | Aug-12-2025

Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data). In this article, we address the performance challenges of LLMs for few-shot biomedical NER by investigating a dynamic prompting strategy involving retrieval-augmented generation (RAG). In our approach, the annotated in-context learning examples are selected based on their similarities with the input texts, and the prompt is dynamically updated for each instance during inference. We implemented and optimized static and dynamic prompt engineering techniques and evaluated them on five biomedical NER datasets. Static prompting with structured components increased average F1-scores by 12% for GPT-4, and 11% for GPT-3.5 and LLaMA 3-70B, relative to basic static prompting. Dynamic prompting further improved performance, with TF-IDF and SBERT retrieval methods yielding the best results, improving average F1-scores by 7.3% and 5.6% in 5-shot and 10-shot settings, respectively. These findings highlight the utility of contextually adaptive prompts via RAG for biomedical NER.
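The dynamic prompting idea in this abstract, choosing annotated in-context examples by their similarity to each input, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the smoothed IDF formula, the prompt wording, and the function names (`select_examples`, `build_prompt`) are all assumptions made for the example, and only the TF-IDF variant of retrieval is shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (token -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, pool, k):
    """Return the k (text, annotation) pairs from pool most similar to query."""
    vectors = tfidf_vectors([text for text, _ in pool] + [query])
    query_vec = vectors[-1]
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(query_vec, vectors[i]),
                    reverse=True)
    return [pool[i] for i in ranked[:k]]


def build_prompt(query, pool, k=2):
    """Assemble a few-shot prompt whose examples are re-selected per input."""
    parts = ["Extract the biomedical entities from the text."]
    for text, annotation in select_examples(query, pool, k):
        parts.append(f"Text: {text}\nEntities: {annotation}")
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical annotated pool, invented purely for illustration.
    pool = [
        ("Aspirin reduced the headache within an hour", "aspirin; headache"),
        ("The BRCA1 gene is linked to breast cancer", "BRCA1; breast cancer"),
        ("Insulin regulates blood glucose", "insulin; blood glucose"),
    ]
    print(build_prompt("Patient was given aspirin for headache", pool, k=1))
```

Because the prompt is rebuilt for every input, the retrieval step runs once per instance at inference time; a static prompt would instead fix the same examples for all inputs.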

large language model, machine learning, natural language, (20 more...)

2508.06504

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Charlotte Bunne on developing AI-based diagnostic tools

AIHub | Feb-20-2025, 15:37:36 GMT

Charlotte Bunne, head of EPFL's Artificial Intelligence in Molecular Medicine Group, is developing AI algorithms to better understand the incredibly complex and high-dimensional data that represent the hundreds of tissue layers and protein markers in an individual cell. EPFL magazine Dimensions spoke to Charlotte Bunne about her work at the cutting edge of AI in medicine and biology. Could you describe the focus of your research? We are developing diagnostic tools for clinics that are driven by AI technologies. This includes forecasting the best treatment that a patient should receive, trying to understand the state of disease that a patient is in, and deciphering important biomarkers or potential drug targets that we should investigate further.

artificial intelligence, charlotte bunne, machine learning, (16 more...)

Genre: Personal (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Educational Setting > K-12 Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Learning Ordinality in Semantic Segmentation

Cristino, Rafael, Cruz, Ricardo P. M., Cardoso, Jaime S.

arXiv.org Artificial Intelligence | Jul-30-2024

Semantic segmentation consists of predicting a semantic label for each image pixel. Conventional deep learning models do not take advantage of ordinal relations that might exist in the domain at hand. For example, it is known that the pupil is inside the iris, and the lane markings are inside the road. Such domain knowledge can be employed as constraints to make the model more robust. The current literature on this topic has explored pixel-wise ordinal segmentation methods, which treat each pixel as an independent observation and promote ordinality in its representation. This paper proposes novel spatial ordinal segmentation methods, which take advantage of the structured image space by considering each pixel as an observation dependent on its neighborhood context to also promote ordinal spatial consistency. When evaluated with five biomedical datasets and multiple configurations of autonomous driving datasets, ordinal methods resulted in more ordinally-consistent models, with substantial improvements in ordinal metrics and some increase in the Dice coefficient. It was also shown that the incorporation of ordinal consistency results in models with better generalization abilities.
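For readers unfamiliar with the Dice coefficient mentioned in this abstract, the following is a minimal sketch of the standard per-class definition, 2|A ∩ B| / (|A| + |B|), computed on label maps stored as nested lists. The function name and the convention of returning 1.0 when a class is absent from both maps are assumptions for this example; the paper's ordinal metrics are not reproduced here.

```python
def dice_coefficient(pred, target, label):
    """Dice overlap for one class between two label maps of equal shape."""
    pred_mask = [p == label for row in pred for p in row]
    target_mask = [t == label for row in target for t in row]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    # Convention assumed here: a class absent from both maps counts as a perfect match.
    return 2 * intersection / total if total else 1.0


if __name__ == "__main__":
    # Tiny hypothetical 2x3 label maps (0 = background, 1 = foreground).
    predicted = [[0, 1, 1],
                 [0, 1, 0]]
    reference = [[0, 1, 0],
                 [0, 1, 1]]
    print(dice_coefficient(predicted, reference, label=1))
```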

consistency, dataset, pixel, (15 more...)

2407.20959

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Portugal > Faro > Faro (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.69)
Transportation > Ground > Road (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Augmenting Biomedical Named Entity Recognition with General-domain Resources

Yin, Yu, Kim, Hyunjae, Xiao, Xiao, Wei, Chih Hsuan, Kang, Jaewoo, Lu, Zhiyong, Xu, Hua, Fang, Meng, Chen, Qingyu

arXiv.org Artificial Intelligence | Jun-18-2024

Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets. In this paper, we proposed GERBERA, a simple-yet-effective method that utilized a general-domain NER dataset for training. Specifically, we performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with multiple additional BioNER datasets. Specifically, our models consistently outperformed the baselines in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight biomedical entity types sourced from five different corpora. Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.

bioner dataset, dataset, general-domain ner dataset, (15 more...)

2406.10671

Country:

Europe > United Kingdom (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement

Dong, Hang, Chen, Jiaoyan, He, Yuan, Horrocks, Ian

arXiv.org Artificial Intelligence | Sep-1-2023

Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases (KB), e.g., ontologies and taxonomies. Existing datasets suffer from three issues: (i) they mostly assume that a new concept is pre-discovered and cannot support out-of-KB mention discovery; (ii) they use only the concept label as the input along with the KB and thus lack the contexts of a concept label; and (iii) they mostly focus on concept placement w.r.t. a taxonomy of atomic concepts, instead of complex concepts, i.e., those with logical operators. To address these issues, we propose a new benchmark, adapting the MedMentions dataset (PubMed abstracts) with the 2014 and 2017 versions of SNOMED CT under the Diseases sub-category and the broader categories of Clinical finding, Procedure, and Pharmaceutical / biologic product. We demonstrate use of the dataset for evaluating out-of-KB mention discovery and concept placement, adapting recent Large Language Model based methods.

biomedical dataset, concept discovery and placement, ontology enrichment

doi: 10.1145/3583780.3615126

2306.14704

Genre: Research Report (0.40)

Industry: Health & Medicine (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.60)

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

He, Yuan, Chen, Jiaoyan, Dong, Hang, Jiménez-Ruiz, Ernesto, Hadian, Ali, Horrocks, Ian

arXiv.org Artificial Intelligence | Jul-22-2023

Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new Bio-ML track at OAEI 2022.

artificial intelligence, machine learning, mapping, (14 more...)

2205.03447

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Lin, Weixiong, Zhao, Ziheng, Zhang, Xiaoman, Wu, Chaoyi, Zhang, Ya, Wang, Yanfeng, Xie, Weidi

arXiv.org Artificial Intelligence | Mar-13-2023

Foundation models trained on large-scale datasets have seen a recent surge in CV and NLP. In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMedCentral's OpenAccess subset, which is 8 times larger than before. PMC-OA covers diverse modalities and diseases, with the majority of the image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. By pretraining a CLIP-style model on PMC-OA, our model, named PMC-CLIP, achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and Medical VQA, with +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
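The R@10 figure quoted in this abstract refers to Recall@K, a common retrieval metric. A minimal sketch of how it is typically computed is given below, under the assumption that each query has exactly one correct candidate and that query i matches candidate i; the abstract does not state the exact evaluation protocol, so the function name and data layout are illustrative only.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching candidate ranks in the top k.

    similarity[i][j] is the score of query i against candidate j, and the
    correct candidate for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(similarity)


if __name__ == "__main__":
    # Hypothetical 3x3 image-to-text similarity scores.
    scores = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.5],
        [0.1, 0.2, 0.7],
    ]
    for k in (1, 2, 3):
        print(k, recall_at_k(scores, k))
```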

artificial intelligence, machine learning, natural language, (19 more...)

2303.0724

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.89)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.70)

A Novel Weighted Combination Method for Feature Selection using Fuzzy Sets

Shen, Zixiao, Chen, Xin, Garibaldi, Jonathan M.

arXiv.org Machine Learning | May-21-2020

In this paper, we propose a novel weighted combination feature selection method using bootstrap and fuzzy sets. The proposed method mainly consists of three processes, including fuzzy sets generation using bootstrap, weighted combination of fuzzy sets and feature ranking based on defuzzification. We implemented the proposed method by combining four state-of-the-art feature selection methods and evaluated the performance based on three publicly available biomedical datasets using five-fold cross validation. Based on the feature selection results, our proposed method produced comparable (if not better) classification accuracies to the best of the individual feature selection methods for all evaluated datasets. More importantly, we also applied standard deviation and Pearson's correlation to measure the stability of the methods. Remarkably, our combination method achieved significantly higher stability than the four individual methods when variations and size reductions were introduced to the datasets.

artificial intelligence, fs method, machine learning, (14 more...)

2005.05003

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.72)
